Talking about unique device IDs

A device ID is simply a string of symbols (or digits) that maps to a real hardware device.

If the string corresponds to the device one-to-one, it can be called a "unique device identifier".

Unfortunately, the Android platform has no stable API that lets developers obtain such a device ID.

Developers usually encounter this dilemma:

As the project evolves, device IDs are needed in more and more places;

However, with each new Android version, it becomes increasingly difficult to obtain a device ID.

Add the fragmentation of the Android platform, and the road to obtaining a device ID is a hard one.

The uses of a device ID can be roughly divided into the following points:

There are only a handful of APIs for obtaining a device identifier, and every one of them has problems to some degree.

Conventional APIs include the following:

The IMEI should be the ideal device ID: it is unique and does not change after a factory reset (on real devices).

However, obtaining the IMEI requires the READ_PHONE_STATE permission, and everyone knows how troublesome that permission is.

Since Android 6.0 in particular, such permissions must be requested at runtime, and many users may refuse to grant them.

Some apps refuse to work unless this permission is granted, which can lower users' opinion of the app.

Moreover, Android 10 completely prohibits third-party apps from obtaining the device's IMEI, even if the READ_PHONE_STATE permission has been granted.

Therefore, a new app should not use the IMEI as its device identifier;

if you already use the IMEI as the identifier, you need to do compatibility work quickly, in particular mapping the new device identifier to the IMEI.

The device serial number is obtained through android.os.Build.SERIAL and is provided by the manufacturer.

If manufacturers followed the specification, the device serial number plus Build.MANUFACTURER should uniquely identify a device.

But in reality not all manufacturers follow the specification, especially on early devices.

The most fatal problem is that on Android 8.0 and above, android.os.Build.SERIAL always returns "unknown";

to obtain the serial number you can call Build.getSerial(), but that requires the READ_PHONE_STATE permission.

On Android 10 and above, as with the IMEI, access is prohibited.

Generally speaking, the device serial number is of marginal value: not much use, yet a pity to throw away.

It is becoming more and more difficult to obtain the MAC address.

Since Android 6.0, the MAC address obtained through WifiManager is a fixed value: 02:00:00:00:00:00.

Later, even reading /sys/class/net/wlan0/address stopped working.

Now only the following method still works (it works even with Wi-Fi turned off):

There is no guarantee that this method will keep working in the future, so cherish it while it lasts~

The Android ID has the lowest barrier to obtain: it requires no permissions, has a 64-bit value range, and offers good uniqueness.

But the shortcomings are also obvious:

1. Flashing, rooting, restoring factory settings, etc. will cause the Android ID to change;

2. Since Android 8.0, the rules for the Android ID have changed:

The result of the two rules is:

First, if the user installs the app on a device running a version below 8.0, uninstalls it, upgrades to 8.0, and then reinstalls the app, the Android ID will be different;

Second, apps with different signatures obtain different Android IDs.

The second point may have an impact on advertising networks and the like.

The author previously wrote an article "Acquisition and Construction of Unique Identification of Android Devices", which mentioned two concepts of device ID: uniqueness and stability.

Uniqueness: Two different devices obtain different device IDs;

Stability: The same device obtains the same device ID at different times.

To analyze uniqueness, we can start with the allocation of IDs:

The first column on the left is the number of bits, the second is the corresponding value range, followed by the number of elements and the corresponding collision probability.

For example, given 50,000 random 32-bit numbers, the probability that at least two of them are equal is about 25%;

in other words, given four groups of 50,000 random numbers each, about one group will contain a duplicate.

A 32-bit number has about 4.3 billion possible values. How can 50,000 random numbers have such a high probability of collision?

If this is confusing, read up on the birthday paradox.
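The 25% figure can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), where N is the size of the value range and n the number of random values. The function name below is only illustrative, and the sketch assumes the values are uniformly random.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate probability that at least two of n_ids uniformly random
    values from a range of 2**bits values are equal (birthday bound)."""
    space = 2.0 ** bits
    exponent = -n_ids * (n_ids - 1) / (2.0 * space)
    # expm1 keeps precision when the resulting probability is very small.
    return -math.expm1(exponent)


p32 = collision_probability(50_000, 32)
print(f"50,000 random 32-bit values: p = {p32:.2%}")  # roughly a quarter
```

Because the number of pairs grows with the square of n, 50,000 values already form about 1.25 billion pairs, which is why a range of 4.3 billion fills up with collisions so quickly.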

The Android ID is a hexadecimal string of length 16, which is in fact 64 bits. Let's analyze its probability of collision:

If an app's cumulative activations reach 5 billion, then roughly one in every two such apps will see a duplicate Android ID.

However, "duplicate" here does not mean a large number of duplicates, only "at least two identical values". That is, if the number of device activations reaches 5 billion (a level many apps never reach -_-), there may be a small number of duplicate Android IDs.

Overall, the uniqueness of Android ID is good.

The JDK's randomUUID can be roughly regarded as a 128-bit random number (6 bits of which are fixed). Even with 20 billion of them, the probability of a collision is only on the order of 10 to the minus 18th power, which is negligible.
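The two figures above can be cross-checked from the other direction: how many random IDs does it take before a collision becomes as likely as not? The sketch below inverts the birthday approximation (n ≈ sqrt(2N ln(1/(1-p)))) and, for the UUID case, uses the leading-order estimate p ≈ n² / 2N. Function and variable names are illustrative, and uniform randomness is assumed.

```python
import math


def ids_for_collision_probability(bits: int, p: float) -> float:
    """Approximate number of uniformly random IDs at which the chance of
    at least one collision reaches p."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * -math.log1p(-p))


# 64-bit Android ID: a 50% chance of some duplicate at about 5 billion IDs.
half_point_64 = ids_for_collision_probability(64, 0.5)

# 20 billion random UUIDs, leading-order estimate p = n^2 / (2N).
n = 20e9
p_uuid_128 = n * n / (2.0 * 2.0 ** 128)  # treating all 128 bits as random
p_uuid_122 = n * n / (2.0 * 2.0 ** 122)  # counting only the 122 random bits

print(f"{half_point_64:.2e}  {p_uuid_128:.1e}  {p_uuid_122:.1e}")
```

Treating the UUID as a full 128-bit value gives a result in line with the order of magnitude quoted above; counting only the 122 random bits raises it by a factor of 64, which is still negligible.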

Stability has two levels:

Whether it is statistical requirements or business requirements, the device ID is required to be unique and stable.

If device IDs collide, activity statistics, user portraits, targeted push, and so on will all be inaccurate;

targeted push is affected most: a parcel delivered to the wrong address can still be recalled, but a push sent to the wrong device is another matter. If the pushed content is important, the consequences could be disastrous.

If the device ID is unstable (ID changes), it will affect the activity statistics (it will be considered a new user), and it will also have a great impact on the user portrait (the behavioral data associated with the previous ID cannot be tracked).

To this end, it is necessary to design a solution to provide a relatively stable and unique device ID.

First of all, two prerequisites must be clarified:

For each of the device IDs analyzed earlier, when it can be obtained, the probability of duplication is small;

if an ID is observed at a certain frequency, say once a day, the probability that it differs from yesterday's value is also small.

How to continue to reduce the probability on the premise that the probability is already small?

One solution is to combine the device IDs (concatenate directly, or calculate the digest after concatenation).

For example, if the probability of duplication and the probability of change are both one in a thousand,

then for two different devices, the probability that both device IDs collide at the same time is one in a million, while the probability that at least one of the two device IDs changes is about two in a thousand.

In other words, concatenating IDs greatly improves uniqueness but reduces stability to some extent (as soon as one element changes, the concatenated ID changes).
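The arithmetic behind those two numbers, as a small sketch. It assumes the individual IDs collide and change independently of each other, which the text implies but does not state; the function names are illustrative.

```python
def concat_collision_probability(p_dup: float, k: int) -> float:
    """A concatenation of k IDs collides only if every component collides."""
    return p_dup ** k


def concat_change_probability(p_change: float, k: int) -> float:
    """A concatenation of k IDs changes if at least one component changes."""
    return 1.0 - (1.0 - p_change) ** k


p = 1e-3
both_collide = concat_collision_probability(p, 2)  # one in a million
any_changes = concat_change_probability(p, 2)      # about two in a thousand
print(both_collide, any_changes)
```

Adding more components keeps pushing the collision probability down, but the change probability grows almost linearly with the number of components.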

But in fact, the most prominent contradiction of the device IDs available today is instability. Therefore, we cannot sacrifice stability in order to improve uniqueness.

To improve stability, a fault-tolerant scheme can be introduced.

There are many fault-tolerance schemes. In network transmission, for example, a checksum is used to verify each message, and the message is resent if an error is detected;

another example is a disk array, where data is written to two disks. Data is lost only when both disks fail at the same time, which greatly reduces the probability of losing data.

But neither scheme suits device IDs, because they rely on a checksum to confirm whether the original information has been modified, and a device ID offers nothing of the kind.

Therefore, a "Byzantine fault tolerance" solution similar to that used in virtual currencies can be introduced.

To put it simply, three device IDs are uploaded to the cloud. If two or more of them match a previous record, the device is considered to be the same one.

Again assuming that the probability of duplication and the probability of change are both one in a thousand:

The same device fails to be recognized across two collections only if "at least two device IDs differ from last time", and the probability of that is about three in a million.

Two different devices are treated as the same only if "at least two of the three device IDs are equal to those of the other device", and the probability of that is also about three in a million.

Therefore, using this solution, both uniqueness and stability can be improved.
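Both "three in a million" figures are the probability that at least two of three independent events occur, each with probability one in a thousand. A minimal sketch of that calculation, assuming independence between the three IDs (the helper name is illustrative):

```python
from math import comb


def at_least_k_of_n(p: float, n: int, k: int) -> float:
    """Probability that at least k of n independent events occur,
    each with probability p (binomial tail)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 1e-3
# Same device not recognized: at least two of the three IDs changed.
missed = at_least_k_of_n(p, 3, 2)
# Different devices merged: at least two of the three IDs collide.
merged = at_least_k_of_n(p, 3, 2)
print(f"{missed:.3e}")  # about three in a million
```

Compared with a single ID at one in a thousand, the two-out-of-three rule improves both failure modes by more than two orders of magnitude, whereas concatenation improves one at the expense of the other.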

The basic idea is: the server has a table of device IDs, and the core attributes (Column) are:

id | did_1 | did_2 | did_3

On each request, the client uploads three device IDs, and the server looks them up:

If a record is found in which at least two of the did values equal those uploaded, return its id;

Otherwise, insert the three uploaded device IDs and return the id of the newly inserted record.
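The lookup-or-insert rule can be sketched as follows. This is an in-memory stand-in for the server table, not a real database schema: the class and method names are hypothetical, the linear scan would be replaced by indexed queries in practice, and treating an empty string as "ID unavailable, never matches" is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeviceRecord:
    id: int                     # primary key of the row
    dids: Tuple[str, str, str]  # did_1, did_2, did_3


class DeviceTable:
    """In-memory stand-in for the server-side device table (hypothetical)."""

    def __init__(self) -> None:
        self._records: List[DeviceRecord] = []
        self._next_id = 1

    def lookup_or_insert(self, did_1: str, did_2: str, did_3: str) -> int:
        uploaded = (did_1, did_2, did_3)
        for record in self._records:
            # Count the positions where the uploaded ID equals the stored one.
            # Assumption: an empty string means "unavailable" and never matches.
            matches = sum(
                1 for stored, new in zip(record.dids, uploaded)
                if new and new == stored
            )
            if matches >= 2:
                return record.id
        record = DeviceRecord(self._next_id, uploaded)
        self._records.append(record)
        self._next_id += 1
        return record.id


table = DeviceTable()
first = table.lookup_or_insert("imei-1", "mac-1", "aid-1")
again = table.lookup_or_insert("imei-1", "mac-1", "aid-2")  # one ID changed
other = table.lookup_or_insert("imei-1", "mac-9", "aid-9")  # only one ID matches
print(first, again, other)
```

The second call returns the id of the first record because two of the three IDs still match, while the third call creates a new record.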

Normally, the primary key of the server table is an auto-increment sequence (to keep insertions ordered),

so we cannot return the primary key of the table directly; otherwise others could easily use it to infer other device IDs and learn the number of users.

Therefore, in addition to the primary key ID, we need another unique ID.

There are two ideas:

Then, three device IDs are needed...

But what if the READ_PHONE_STATE permission is not granted, or the device runs Android 10 or later?

First of all, the device serial number should still be collected. After all, it can still be obtained on some older versions, and every bit of distinguishing power helps;

then, collect some device-related information, such as the model and hardware information (the same model may come in several configurations, so hardware information is collected as well).

The final matching rules are as follows:

If there is no matching device, it is considered a new device;

In that case, a new udid is generated and returned, and the new device's information (device IDs, hardware information) is inserted at the same time.

Regarding hardware information, one requirement needs to be met: it will not change after the device is restarted, restored to factory settings, etc.

Regular information includes the number of CPU cores, the RAM/ROM size (collected in GB units rather than exact to the byte, otherwise it changes too easily), the screen resolution and dpi, and so on. Combined with the model, a conservative estimate is thousands or even tens of thousands of possibilities. That is certainly far short of the 2^64 of the Android ID, but it can still serve as auxiliary reference information.
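A minimal sketch of the "collect in GB units" idea: sizes are rounded to whole gigabytes before they are stored, so that small fluctuations in the reported byte count do not change the profile. The field selection, the function names, and the sample values are illustrative assumptions; 1 GB is taken as 1024**3 bytes here.

```python
def size_in_gb(num_bytes: int) -> int:
    """Round a byte count to the nearest whole GB (1 GB = 1024**3 bytes)."""
    return round(num_bytes / 1024 ** 3)


def hardware_profile(model, cpu_cores, ram_bytes, rom_bytes, width, height, dpi):
    """Coarse hardware description that should survive reboots and resets."""
    return (model, cpu_cores, size_in_gb(ram_bytes), size_in_gb(rom_bytes),
            width, height, dpi)


# Two readings from the same hypothetical device with slightly different sizes.
a = hardware_profile("ModelX", 8, 3_900_000_000, 58_000_000_000, 1080, 2160, 440)
b = hardware_profile("ModelX", 8, 3_905_000_000, 58_200_000_000, 1080, 2160, 440)
print(a == b)
```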

Imagine that the device serial number cannot be obtained and one of the Android ID and MAC address has changed, so the only records retrieved match on the Android ID alone or on the MAC alone. Among the vast sea of devices, it cannot be ruled out that one or two share the same Android ID or MAC.

Which record should be chosen then? With the device information added, the distinction can perhaps be made.

Regular device information is easy to tamper with, so in addition to it we can dig out some less commonly used device features, such as NetworkInterface and sensor-related information.

When the regular information has been tampered with, the device can still be identified as the same one if the less common information has not changed.

As for how to dig these out, everyone has their own tricks. People who work on phone hardware or ROMs probably know more of the relevant APIs.

To make retrieval easier, we can use MurmurHash to compress the information to 64 bits (the size of a Long).
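A sketch of compressing device information into a signed 64-bit value. MurmurHash is not part of the Python standard library, so this example substitutes BLAKE2b truncated to 8 bytes purely to stay self-contained; it is a different hash function, and the field order, separator, and sample values are assumptions.

```python
import hashlib


def hash64(text: str) -> int:
    """Hash text to a signed 64-bit integer (fits a Java/SQL Long).
    Uses BLAKE2b with an 8-byte digest as a stand-in for MurmurHash."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, byteorder="big", signed=True)


# Hypothetical device description: model, cores, RAM GB, ROM GB, resolution, dpi.
device_info = "|".join(["ModelX", "8", "4", "64", "1080x2160", "440"])
info_hash = hash64(device_info)
print(info_hash)
```

A fixed-width integer column is cheaper to index and compare than the raw strings, at the cost of a small collision probability of the kind analyzed earlier for 64-bit values.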

Furthermore, after obtaining the udid, you can upload the udid and device information to the cloud regularly (for example, every two days). The cloud compares the stored information with the uploaded information, and updates it if they are not the same. This can improve the stability of udid.

For example, if the user uninstalled the app while the device was on Android 7.0 and reinstalled it after upgrading to Android 8.0, the Android ID has changed, but we can still recognize the device from the MAC and device information and update its Android ID at the same time;

if one day it is the MAC's turn to become unavailable, we can still identify the device from the Android ID and device information.
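The periodic refresh on the cloud side might look like the sketch below. The field names and the dict-based record are hypothetical, and skipping empty values (so that an ID that has become unavailable does not erase the stored one) is an added assumption rather than something the text specifies.

```python
from typing import Dict, List


def refresh_record(stored: Dict[str, object], uploaded: Dict[str, object]) -> List[str]:
    """Update the stored device record in place with any changed, non-empty
    uploaded values, and return the names of the fields that were updated."""
    changed = []
    for key, value in uploaded.items():
        # Assumption: empty values mean "unavailable" and never overwrite.
        if value and stored.get(key) != value:
            stored[key] = value
            changed.append(key)
    return changed


stored = {"android_id": "a1", "mac": "m1", "hw_hash": 123}
# After an OS upgrade the Android ID differs and the MAC can no longer be read.
uploaded = {"android_id": "a2", "mac": "", "hw_hash": 123}
changed = refresh_record(stored, uploaded)
print(changed, stored)
```

Keeping the record current this way means the next lookup still has two matching identifiers even after one of them has drifted.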

This article introduces the use and current situation of device IDs, analyzes the characteristics of existing device IDs, and finally proposes a set of device ID construction plans.

Judging by the trend of recent years, the APIs for the various device IDs are likely to become more and more restricted. It is difficult to construct a reliable device ID from the client alone; collecting information and computing in the cloud is comparatively easier.

For the concrete implementation, the author has written a demo and published it on GitHub, for reference only.

Project address:

https://en.wikipedia.org/wiki/Birthday_attack
