SPSSAU Data Format

Data is the foundation of research methods. This document outlines the data format requirements for common research methods in SPSSAU, including ANOVA, t test, paired t test, multiple-choice question, chi-square test, repeated measures ANOVA, fuzzy comprehensive evaluation, AHP (Analytic Hierarchy Process), ARIMA time series model, and panel model.

No matter the type of data, it must be organized in a standardized format, including the following:

  • The first row should only contain titles, and it must not be empty; otherwise, titles cannot be dragged for operation.
  • Merged cells are not allowed.
  • There should not be any completely empty rows or columns.
  • Download the SPSSAU data format reference template.

Default data format is as follows:

By default, the data format should be structured such that each row represents one sample, and each column represents one title, as shown above. This format applies to most research methods.

One-Way ANOVA

ANOVA is used to study differences between groups, such as the difference in satisfaction levels among people with different educational backgrounds. Therefore, the data format must include a grouping variable X (e.g., education level) and an analysis variable Y (e.g., satisfaction). Sometimes the data contains only analysis variables (e.g., three analysis variables) and no grouping variable. To analyze the differences between them, restructure the data by adding a "Group" column that records which variable each value came from and stacking all the values into a single Y column, as shown below:
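The restructuring described above can be sketched in a few lines of Python. This is only an illustration: the column names (Y1, Y2, Y3) and the values are made up, and SPSSAU itself only needs the resulting two-column layout.

```python
# Hypothetical wide data: three analysis variables, one column each.
wide = {
    "Y1": [3, 4, 5],
    "Y2": [2, 2, 4],
    "Y3": [5, 5, 3],
}

def stack_columns(columns):
    """Return one row per original cell, tagged with the column it came from."""
    rows = []
    for group_name, values in columns.items():
        for value in values:
            rows.append({"Group": group_name, "Y": value})
    return rows

long_rows = stack_columns(wide)
for row in long_rows:
    print(row["Group"], row["Y"])
```

The same stacking applies when a t test has two value columns (e.g., experimental and control) and no grouping column.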

Independent-Samples t Test

The t test is used to examine the difference between two groups, such as the difference in satisfaction based on gender. The data format must include a grouping variable X (e.g., gender) and an analysis variable Y (e.g., satisfaction).

Sometimes, the data only has two columns without a grouping variable, such as the experimental group and the control group. In this case, the data should be modified by adding a "group" column and stacking the data to form the analysis item Y, as shown below:

Paired t Test

Paired data, which is used by the paired t test, the paired chi-square test, and similar methods, has a relatively special format. Each row holds one pair, and the two paired measurements (for example, the experimental and control group data) are placed in separate columns, as shown above.

Multiple Choice

In survey research, multiple-choice questions are used, and their data format is special. For example, if a multiple-choice question has 4 options, there will be 4 columns in the data, each representing one option. 1 indicates the option is selected and 0 indicates it is not selected. When conducting research using multiple-choice questions, all the options for the same multiple-choice question must be included in the analysis frame.
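As a minimal sketch of this 0/1 layout, the following Python snippet turns each respondent's selected options into one column per option. The option labels and answers are hypothetical.

```python
OPTIONS = ["A", "B", "C", "D"]  # hypothetical labels for a 4-option question

def encode_multiple_choice(selected, options=OPTIONS):
    """Return one 0/1 value per option, in option order."""
    chosen = set(selected)
    return [1 if option in chosen else 0 for option in options]

# Made-up answers from three respondents (the third selected nothing)
responses = [["A", "C"], ["B"], []]
encoded = [encode_multiple_choice(r) for r in responses]

for row in encoded:
    print(row)
```

Each inner list corresponds to one data row, and its four values correspond to the four columns of the question.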

Chi-Square Test

The chi-square test is used to examine the differences between X and Y, where both X and Y are nominal data. When using the Chi-Square Test in Experimental/Medical Research, SPSSAU supports two types of data formats: the regular format (non-weighted format) and the weighted data format. The figure above shows the regular format (non-weighted), where one row represents a sample and one column represents an attribute. All the original data information should be listed.

The figure below shows the weighted data format. In medical or experimental research, summary data, which includes weighted items, is often used. For example, in the figure below, X has 2 categories and Y has 3 categories, resulting in 2×3=6 combinations. The data includes only the summary items for these 6 combinations (i.e., the weighted data), which are 40, 10, 20, 30, 20, and 50. This represents a total of 170 samples. If the regular format (non-weighted format) were used, there would be 170 rows of data. However, with the weighted format, only 6 rows are needed to represent the data, as shown in the figure below.
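The relationship between the two formats can be illustrated with a short Python sketch that expands the 6 weighted rows into 170 regular rows. The category labels and the assignment of each count to a particular combination are assumptions made for the example; only the six counts come from the text.

```python
# Weighted format: one row per X/Y combination plus its count (weight).
weighted = [
    ("X1", "Y1", 40), ("X1", "Y2", 10), ("X1", "Y3", 20),
    ("X2", "Y1", 30), ("X2", "Y2", 20), ("X2", "Y3", 50),
]

def expand(weighted_rows):
    """Repeat each (x, y) combination `count` times to get the regular format."""
    regular = []
    for x, y, count in weighted_rows:
        regular.extend([(x, y)] * count)
    return regular

regular = expand(weighted)
print(len(weighted), len(regular))  # 6 170
```

Both layouts carry the same information, so either can be uploaded for methods that accept the weighted format.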

The weighted data format is primarily used when all data are nominal (categorical). SPSSAU supports both the regular and weighted data formats. The regular format provides all raw data, while the weighted format only provides summarized data. The following methods in SPSSAU support the weighted data format:

  • [Visualization] Word Cloud
  • [Survey Research] Correspondence Analysis
  • [Experimental/Medical Research] Chi-Square Test
  • [Experimental/Medical Research] Kappa
  • [Experimental/Medical Research] Paired Chi-Square
  • [Experimental/Medical Research] Poisson Regression
  • [Experimental/Medical Research] Ridit Analysis
  • [Experimental/Medical Research] Chi-Square Goodness of Fit
  • [Experimental/Medical Research] Poisson Test

Repeated Measures ANOVA

Repeated measures data refers to data collected multiple times from the same set of samples (or cases) at different time points. The key feature of repeated measures data is that there will always be an ID number (i.e., sample or case number) and time point data. The same ID will have data for multiple time points. For example, in the figure, there are 6 samples (6 ID numbers), and 5 time points are measured. Therefore, there will be 6*5=30 rows of data. Each ID number will be repeated 5 times, and each time point will be repeated 6 times.
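A small Python sketch of this layout follows. It assumes the measurements start out with one entry per ID and five values each (a hypothetical "wide" layout with made-up numbers) and converts them into the one-row-per-ID-and-time-point form described above.

```python
# Hypothetical wide layout: one entry per ID, five measurements each.
wide = {
    1: [5.1, 5.4, 5.9, 6.1, 6.3],
    2: [4.8, 5.0, 5.2, 5.5, 5.9],
    3: [6.0, 6.1, 6.4, 6.8, 7.0],
    4: [5.5, 5.6, 5.6, 5.9, 6.0],
    5: [4.9, 5.3, 5.7, 6.0, 6.4],
    6: [5.2, 5.2, 5.6, 5.8, 6.1],
}

def to_long(wide_data):
    """Return one row per (ID, time point) pair."""
    rows = []
    for sample_id, values in wide_data.items():
        for time_point, value in enumerate(values, start=1):
            rows.append({"ID": sample_id, "Time": time_point, "Value": value})
    return rows

long_rows = to_long(wide)
print(len(long_rows))  # 30
```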

Fuzzy Comprehensive Evaluation

In Fuzzy Comprehensive Evaluation, the evaluation criteria should be grouped according to their corresponding evaluation categories. Each evaluation category (e.g., "Unsatisfied," "Somewhat Unsatisfied," "Satisfied," "Very Satisfied") should be placed in its own separate column.

If each evaluation criterion has its own weight, it is necessary to add a separate column for "Weight." The column labeled "Weight" is optional. If this data is not available, SPSSAU assumes that all criteria have equal weights. The column for criteria is for the researcher's reference and does not need to be included in the analysis box.

In the figure above, the numbers in each evaluation category represent the percentage of selections. For example, for Criterion 1, the selection proportion for Evaluation Item 1 is 0.2 (or 20%), and for Evaluation Item 2, the selection proportion is 0.5 (or 50%). Researchers may also input the number of selections rather than the proportion. Whether inputting proportions or the number of selections, SPSSAU will automatically normalize the data so that the proportions for each evaluation item under the same criterion sum to 1.
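The normalization step can be pictured with the following Python sketch. It is not SPSSAU's implementation, only an illustration of scaling one criterion's values so that they sum to 1; the counts are made up.

```python
def normalize_row(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("a criterion needs at least one positive value")
    return [v / total for v in values]

# Made-up selection counts for one criterion across four evaluation categories
counts = [20, 50, 20, 10]
proportions = normalize_row(counts)
print(proportions)

# Proportions that already sum to 1 are left effectively unchanged
same = normalize_row([0.2, 0.5, 0.2, 0.1])
```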

Analytic Hierarchy Process (AHP)

The data format for AHP, specifically the judgment matrix, is unique. As shown in the figure above, researchers can modify the names of the criteria and the numbers in the white cells. The judgment matrix is a reciprocal matrix about its "main diagonal": the value in each "blue" cell is the reciprocal of the value in the mirrored "white" cell (for example, 3 and 1/3). When the values in the "white" cells change, the corresponding values in the "blue" cells will automatically adjust.
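In a standard AHP judgment matrix, each entry on one side of the main diagonal is the reciprocal of its mirror entry, and the diagonal is 1. The Python sketch below fills in a full matrix from one triangle under that rule. Treating the upper triangle as the editable cells is an assumption, and the judgment values are made up.

```python
from fractions import Fraction

def complete_judgment_matrix(upper):
    """Build a full n*n reciprocal matrix from upper-triangle judgments.

    `upper[(i, j)]` with i < j states how strongly criterion i is
    preferred over criterion j (0-based indices).
    """
    n = max(j for _, j in upper) + 1
    matrix = [[Fraction(1)] * n for _ in range(n)]
    for (i, j), value in upper.items():
        matrix[i][j] = Fraction(value)
        matrix[j][i] = 1 / Fraction(value)  # mirrored cell is the reciprocal
    return matrix

# Made-up judgments for three criteria
m = complete_judgment_matrix({(0, 1): 3, (0, 2): 5, (1, 2): 2})
for row in m:
    print([str(x) for x in row])
```

Using `Fraction` keeps values such as 1/3 exact, which makes the reciprocal relationship easy to check.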

Autoregressive Integrated Moving Average (ARIMA)

ARIMA is used for time series data analysis, which includes both time and analysis items in two columns. In the figure above, "Year" represents the time variable, and "Sales" represents the actual analysis variable. When analyzing the data, the time variable doesn't need to be explicitly set, but the data should be organized in increasing order of dates (from top to bottom) as shown in the figure. This order is required because the algorithm assumes time will be sorted in ascending order during the analysis.

Panel Model

Panel data is a special format used for analyzing data from multiple subjects (e.g., companies) over time. For example, a study of 100 companies' financial data over 5 years will result in 100*5=500 rows of data.

If all 100 companies have complete 5-year data, it is considered balanced panel data. If some companies have missing data (e.g., only 3 years of data), it is considered unbalanced panel data.
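Whether a panel is balanced can be checked before uploading. The Python sketch below assumes each row is an (individual ID, year) pair and treats the panel as balanced when every individual has a row for every year that appears in the data; the company IDs and years are hypothetical.

```python
def is_balanced(rows):
    """Return True if every individual has data for every year in the panel."""
    all_years = {year for _, year in rows}
    years_by_id = {}
    for individual, year in rows:
        years_by_id.setdefault(individual, set()).add(year)
    return all(years == all_years for years in years_by_id.values())

# Two hypothetical companies observed over three years
balanced = [(c, y) for c in ("A", "B") for y in (2020, 2021, 2022)]
unbalanced = balanced[:-1]  # company B is missing 2022

print(is_balanced(balanced), is_balanced(unbalanced))  # True False
```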

When analyzing panel data in SPSSAU, "Individual ID" (e.g., the stock code of a company or just an identifier) and "Year" (shown as "Year" in the figure) are used together to identify the panel structure. "Individual ID" and "Year" are the key fields that inform the system that the data is panel data. They don't hold additional meaning apart from distinguishing each subject and time period.

Kendall's Coefficient

Kendall's coefficient is used to assess the consistency among multiple evaluators when evaluating several evaluatees. For example, it could be used to evaluate how consistent 4 judges are in scoring 10 contestants. In this case, the evaluators are the 4 judges (represented by 4 columns), and the evaluatees are the 10 contestants (represented by 10 rows).

The data format typically has 1 column for each evaluator and 1 row for each evaluatee. This is the default format in SPSSAU. However, sometimes the data may be transposed, with 1 column for each evaluatee and 1 row for each evaluator. In this case, the parameter should be set to "Evaluator (Row)".

Kappa

The Kappa consistency coefficient is used to assess the consistency between two measures (e.g., two diagnostic methods, two doctors, or two judges) when evaluating subjects (e.g., patients, contestants).

There are two data formats supported by SPSSAU: "weighted" and "unweighted". In the "weighted" format, as shown above, columns A and B represent the two measures (or doctors), and a separate column is used to indicate the number of cases diagnosed by each doctor. For the "weighted" format, it is necessary to include the weighting data in the corresponding cells.

In the "unweighted" format, there is no weight column. Only two columns of raw data are required.

Grey Relational Analysis

Grey Relational Analysis studies the degree of correlation between data, i.e., the relationship between the characteristic sequences and the main sequence. The main sequence is marked by a separate column, and each characteristic sequence is marked by one column. In the figure above, the "Sample Id" is just an identifier with no practical significance and is used to mark the ID number of each sample, which could be, for example, the year. This data is not required.

Entropy Method

The entropy method is used to calculate the weights of indicators. Each indicator occupies one column in the data. In the figure above, ID is just an identifier with no practical significance, used to mark the ID number of each sample, such as the year. This data is not necessary.

If panel data is used with the entropy method, the data format will be like the figure below. For instance, if there are 100 companies with 5 years of indicator data, the total number of rows will be 100*5 = 500. This is the required data format, but during analysis, only the "Indicator" data is needed.

Entropy Weight TOPSIS

Entropy Weight TOPSIS is used to study how closely the indicators align with the Ideal Solution. Each indicator occupies one column in the data. Each research subject is represented by one row, but this data does not need to be used during analysis. SPSSAU will automatically number the subjects from top to bottom. Alternatively, you can place labels in the Label box.

Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)

TOPSIS is used to evaluate the proximity of indicators to the Ideal Solution. Like the entropy-weighted TOPSIS method, each indicator occupies one column in the data. Each research subject occupies one row, but again, this data does not need to be used during analysis. SPSSAU will assign default row numbers unless labels are specified in the Label box.

Weight

Survey Research includes methods like AHP (Analytic Hierarchy Process) and the Entropy method for weight calculation. Each sample occupies one row, and each weight-calculating indicator occupies one column. You can directly use standard survey data for analysis.

ANOVA

ANOVA, whether it's one-way, two-way, three-way, or multivariate, is used to examine the differences between X and Y. Each X occupies one column, and each Y occupies one column. If there are covariates, each covariate occupies one column. The data format is similar to the one shown in the figure above.

Quadrant Plot

A quadrant plot projects data points onto a coordinate system. Data points have both X and Y attributes, so they occupy two columns. Additionally, each data point may have a label, which occupies one column. The "Label" column is optional. If labels are provided, the system will display them accordingly; if not, no labels will be shown. The data format is similar to the one shown in the figure above.

Weighted Rank Sum Ratio (WRSR)

WRSR analyzes the overall performance level of research subjects across different "indicators." The data format requires one column per indicator and one row per research subject, as shown in the figure.

Coupling Coordination Degree

Coupling Coordination Degree analyzes the coordination between different systems. Therefore, one column represents the data of each system, and one row represents each research subject. The data format is similar to the one shown in the figure.

Ridit Analysis

Ridit Analysis studies the differences between X and Y, where X is categorical data and Y is usually interval data. SPSSAU supports both weighted and unweighted formats. For the unweighted format, one row represents one research subject (sample), as shown in the figure above.

For the weighted format, as shown below, if X has 2 conditions and Y has 3 conditions, this results in 2*3 = 6 possible combinations. The data includes 6 summary groups (weighted items) with values: 40, 10, 20, 30, 20, and 50. This is equivalent to a total of 170 samples. If using the unweighted format (i.e., the regular format), there should be 170 rows of data. However, in the weighted format, only 6 rows are required.

Intraclass Correlation Coefficient (ICC)

ICC is used for reliability analysis, such as inter-rater consistency or test-retest reliability. For example, if 3 doctors score the IQ of 5 patients, the consistency between their scores can be analyzed. The data requires 3 columns for the 3 doctors, with each column representing the data of one doctor and each row representing one patient. The format is similar to paired data, as shown in the figure.

Chi-Square Goodness of Fit

Chi-Square Goodness of Fit analyzes the proportion differences in categorical data. SPSSAU supports two formats: unweighted and weighted.

The unweighted format (shown in the figure) has one row per sample and one column per attribute. All raw data should be listed.

In many cases, only summary data (with weighted items) is available. For example, in the figure below, there are 3 research conditions with sample sizes of 40, 10, and 20, totaling 70 samples. In the unweighted format, there should be 70 rows of data. However, the weighted format only requires 3 rows, as shown below.

Paired Wilcoxon Test

Paired data is typically used in experiments. Its key feature is that it consists of exactly two columns with the same number of rows, because each row holds one pair. If the two sets of measurements have different numbers of rows, the data is probably not paired, and an unpaired non-parametric test may be needed instead. Unpaired non-parametric tests use a different data format from the paired Wilcoxon test, so check that the data format matches the method.

Correspondence Analysis (CA)

Correspondence Analysis studies the relationships between multiple categorical data. When using Correspondence Analysis in the Survey Research, SPSSAU supports two data formats: the standard format (unweighted format) and the weighted format.

In many cases, only aggregated data, i.e., data with weights, is available. For example, if the variable X has 2 categories and Y has 3 categories, there will be 2*3=6 combinations, and the data consists of summary items (weighted values) for 6 combinations, which are 40, 10, 20, 30, 20, and 50, respectively. This corresponds to a total of 170 samples. If using the standard format (unweighted), the data would consist of 170 rows, but with the weighted format, only 6 rows are needed, as shown in the figure below.

Kano Model

The Kano Model studies the priority of demand for functions/services. Data is typically collected through questionnaires, where each function/service has both a positive and a negative question. The model supports only five options, meaning the data consists of five values: 1, 2, 3, 4, and 5. Each row represents one measurement sample, and each column represents one attribute, as shown in the figure above.

Grey Prediction Model

GM(1,1) Model is typically used for predicting with very few samples. If the data includes time, it is not included in the analysis items, but the data should generally be sorted by time before inputting into the system. The data format is similar to the one shown in the figure above.

Generalized Estimating Equations (GEE)

Generalized Estimating Equations (GEE) are used for analyzing longitudinal data, and therefore the data must include a 'Subject ID' column to identify the measurement object's ID number. This ID number typically appears multiple times (e.g., if one person has been measured five times, their ID appears five times), as shown in the figure above.

Poisson Regression

Poisson Regression data can include a baseline (exposure). For example, if Y is the number of people "with cancer" in a city, the baseline (exposure) is "the population total of a specific city." The data format is similar to the one shown in the figure above.

Negative Binomial Regression (NBR)

In Negative Binomial Regression, the data may include a base (exposure). For example, if Y is the number of people "with cancer" in each city, then the base (exposure) is the "total population of each city." The data format is similar to the one shown above.

Propensity Score Matching (PSM)

Propensity Score Matching (PSM) requires the research variable to contain only the values 0 and 1. The feature variables have no special data requirements, as shown in the figure above.

Dose Response

Dose Response requires three columns of data: dose, total, and responses. Dose represents the dose level, responses refers to the number of cases responding at that dose level, and total represents the total number of cases at that dose level. The data format is similar to the one shown in the figure above.

Cox Proportional Hazards Regression

Cox Proportional Hazards Regression involves two dependent variables: Y1 for survival time and Y2 for survival status. Y2 must contain only two values, 0 and 1. The data characteristics of X or stratification variables are flexible, and stratification is optional. The data format is similar to the one shown in the figure above.

Kaplan Meier (KM)

Kaplan Meier involves two dependent variables: Y1 for survival time and Y2 for survival status. Y2 must contain only two values, 0 and 1, as shown in the data format above.

Two Stage Least Squares Regression (TSLS)

Two Stage Least Squares Regression (TSLS) involves four types of data: one dependent variable, instrumental variables, endogenous variables, and exogenous variables. The number of instrumental variables must be greater than or equal to the number of endogenous variables. Exogenous variables may or may not be included. The data format is similar to the one shown in the figure above.

Conditional Logistic Regression

Conditional Logistic Regression is typically used in case-control studies, where data has paired observations (e.g., one case matched with several controls, commonly 1:1 or 1:M, where M ≤ 3). In preparing the data, a 'Pair group number' column is needed to identify the pairing. Additionally, the Y value must be 0 or 1. The data format is similar to the one shown in the figure above.

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) involves one column for each indicator and one row for each sample. If panel data is used (e.g., 100 companies over 10 years, resulting in 1000 samples), two additional columns for the company name and year may be necessary to identify the panel format. PCA does not differentiate between panel or non-panel data and analyzes only the indicators. The sample size should generally be five times the number of indicators. The data format is similar to the one shown in the figure above.

Exploratory Factor Analysis (EFA)

Factor Analysis involves one column for each indicator and one row for each sample. For panel data (e.g., 100 companies each observed over 10 years), there are 100*10=1000 samples, and two additional columns for company name and year may be needed to identify the panel format. However, factor analysis does not differentiate between panel and non-panel data, as it focuses only on the indicators. Additionally, the sample size should generally be at least five times the number of indicators. The data format is similar to the one shown in the figure above.

Net Promoter Score (NPS)

NPS involves scores ranging from 0 to 10, requiring 11 distinct values. If the raw data contains scores from 1 to 12 or other values, use "Data Processing -> Data Recode" to adjust the values before analysis. The data format is similar to the one shown in the figure above.

RFM

RFM requires three columns: recency (R), frequency (F), and monetary value (M). R should be a number rather than a date. If dates are provided, process the data in Excel before uploading it to SPSSAU. The data format is similar to the one shown in the figure above.
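If the dates are handled in a script rather than in Excel, the conversion might look like the Python sketch below. It assumes recency is measured as the number of days between the last purchase and a chosen reference date, which is a common convention but is not stated in this document. The customer IDs and dates are made up.

```python
from datetime import date

def recency_days(last_purchase, reference):
    """Number of whole days from the last purchase to the reference date."""
    return (reference - last_purchase).days

reference_date = date(2024, 1, 31)  # hypothetical analysis date
last_purchases = {
    "C001": date(2024, 1, 1),
    "C002": date(2023, 12, 31),
}

r_column = {cid: recency_days(d, reference_date) for cid, d in last_purchases.items()}
print(r_column)  # {'C001': 30, 'C002': 31}
```

The resulting numbers, not the original dates, would go into the R column.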

Range Analysis

Range Analysis is used for analyzing orthogonal design tables. After completing the experiment, the data from the Excel table (obtained from SPSSAU -> Orthogonal Experiment) is used. The number of factors is determined by the orthogonal or experimental design. When uploading the data, factor levels must be numbered (e.g., 1, 2, 3). Use "Data Processing -> Data Label" to label the meaning of the numbers. The data format is similar to the one shown in the figure above.

Word Cloud

Word Cloud typically visualizes keywords. SPSSAU supports both weighted and unweighted formats. In the weighted format, the first column shows the keywords, and the second column shows the times (i.e., the weight of each keyword). In the weighted format, the 'Times' should be placed in the corresponding box. In the unweighted format, there is only one column containing the keywords, with many repetitions.
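The two formats are easy to convert between. As a minimal sketch, the Python snippet below counts a made-up unweighted keyword column to produce the keyword/times pairs of the weighted format.

```python
from collections import Counter

# Unweighted format: one keyword per row, with repetitions (made-up data)
keywords = ["data", "model", "data", "test", "data", "model"]

# Weighted format: each keyword once, together with its number of occurrences
weighted = Counter(keywords).most_common()
print(weighted)  # [('data', 3), ('model', 2), ('test', 1)]
```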

It is also recommended to use Text Analysis for text processing.

Decision Making Trial and Evaluation Laboratory (DEMATEL)

For DEMATEL, the data is entered directly into the table (or edited in the table). The data format consists of a square matrix (excluding the header), where the diagonal elements are always zero. The data format is similar to the one shown in the figure above.

Interpretive Structural Model (ISM)

For ISM, the data is entered directly into the table (or edited in the table). The data format consists of a square matrix (excluding the header), where the diagonal elements are always zero. The data format is similar to the one shown in the figure above.

Hierarchical Linear Model (HLM)

For HLM in Medical Research, the data format is as shown in the figure: the group column holds the group number of each sample. Note that this column will contain many repeated values, because many samples (e.g., students) belong to the same group (e.g., the same school).

Price Sensitivity Method (PSM)

PSM in Survey Research supports two formats, one based on 'Price' as the header and one based on 'Attitude' as the header. In the 'Price' format, each column represents a price point, and each row represents one sample. In the 'Attitude' format, each column represents one of the four attitude options, and each row represents one sample. The specific format depends on how the questionnaire is designed.

Data Envelopment Analysis (DEA)

For DEA, the data format is as shown below: one column holds the DMU names (if it is not provided, SPSSAU will automatically label the items as Item 1, Item 2, Item 3, etc.), and each remaining column represents one indicator (whether input or output).

VIKOR

For VIKOR, the data format is as shown above: one column holds the evaluation objects (if it is not provided, SPSSAU will automatically label the items as Item 1, Item 2, Item 3, etc.), and each remaining column represents an indicator (whether positive or negative).

Difference-in-Differences (DID)

For DID, the Treated and Time variables should contain only the values 0 and 1, and there must be a corresponding dependent variable Y. Control variables are optional and depend on the specifics of the research.

For Multi-period DID, the format is similar to the figure below. Treated should only be either 0 or 1, with 0 indicating the control group and 1 indicating the treated group. Time should also be 0 or 1, with 0 indicating before (pre-treatment) and 1 indicating after (post-treatment). The interaction term Treated * Time can be generated by Data Processing -> Generate Variable -> Product.
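For readers preparing the data outside SPSSAU, the interaction term is simply the row-wise product of the two 0/1 columns. The Python sketch below illustrates this with made-up rows; the dictionary keys are illustrative names, not required column titles.

```python
def add_interaction(rows):
    """Validate the 0/1 coding and add the Treated * Time product to each row."""
    for row in rows:
        if row["Treated"] not in (0, 1) or row["Time"] not in (0, 1):
            raise ValueError("Treated and Time must be coded as 0 or 1")
        row["Treated*Time"] = row["Treated"] * row["Time"]
    return rows

# Made-up data: one row per combination of group and period
did_rows = add_interaction([
    {"Treated": 0, "Time": 0, "Y": 10.0},
    {"Treated": 0, "Time": 1, "Y": 11.0},
    {"Treated": 1, "Time": 0, "Y": 10.5},
    {"Treated": 1, "Time": 1, "Y": 13.0},
])
print([r["Treated*Time"] for r in did_rows])  # [0, 0, 0, 1]
```

The product equals 1 only for treated-group observations in the post-treatment period.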

Composite Index

For Composite Index, the data format is similar to the above figure: one column holds the names of the evaluation items (if it is not provided or not placed in the 'Label' box, SPSSAU will default to naming them Item 1, Item 2, Item 3, etc.). Each indicator occupies one column.

Obstacle Degree

In Obstacle Degree, each indicator occupies one column. As for the relationship between the indicator and its corresponding "Criterion Stratum", the researcher should manually place them accordingly. Also, the weights for each "Criterion Stratum" and "Indicator Stratum" need to be input. This can be done by clicking the "Weight" button on the right of the "Start".

Regression Discontinuity Design (RDD)

In RDD, each row represents a single sample. The data must include the outcome variable (Y) and the driving variable (X). The Fuzzy model is optional, and control variables can be zero or multiple.

Fisher's Exact Test

For Fisher's Exact Test, the data is in summary format, where the numbers represent the counts of the cross categories. The cell A1 must be empty. For example, in a 2*3=6 structure, 2 represents drug A and drug B, while 3 represents cured, significant, and ineffective. The numbers represent specific counts, such as 12 samples for drug A with ineffective results.

Malmquist Index

For Malmquist Index, which measures panel data efficiency in terms of input-output, the data must be in panel format. For example, if data pertains to 100 companies over 5 years, the dataset will have 500 rows (100*5). One column will be the DMU (company names), and the time column will include the years (e.g., 2020, 2021, 2022, 2023, 2024), with each research indicator occupying one column.

Slack Based Measure (SBM)

For SBM, one column will list the DMU (optional, can be dragged to the "DMU" box), and the remaining columns will hold research indicators, such as input, output, or undesirable output indicators.

Markov Forecasting

For Markov Forecasting, the data typically includes two items: "initial probability" and "state transition matrix". The "initial probability" should be placed in column A. The "state transition matrix" is an n*n matrix, starting from column B. Cell B1 must be empty, as shown above.

Note

If there are 10 states, the "initial probability" will consist of 10 probability values summing to 1, and the "state transition matrix" will be a 10*10 matrix, with each row summing to 1.
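These constraints can be checked before entering the data. The Python sketch below validates a small made-up example with 3 states and also shows the usual one-step forecast, in which the initial probability vector is multiplied by the transition matrix. The function names are illustrative, and the sketch does not claim to reproduce SPSSAU's own computation.

```python
import math

def validate_markov_input(initial, transition):
    """Check the shape and the sum-to-1 constraints described in the note."""
    n = len(initial)
    if not math.isclose(sum(initial), 1.0, abs_tol=1e-9):
        raise ValueError("initial probabilities must sum to 1")
    if len(transition) != n or any(len(row) != n for row in transition):
        raise ValueError("state transition matrix must be n*n")
    for row in transition:
        if not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            raise ValueError("each row of the transition matrix must sum to 1")

def one_step_forecast(initial, transition):
    """Return the state probabilities after one transition."""
    n = len(initial)
    return [sum(initial[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Made-up example with 3 states
initial = [0.5, 0.3, 0.2]
transition = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
validate_markov_input(initial, transition)
next_state = one_step_forecast(initial, transition)
print(next_state)
```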

Dagum Gini Coefficient

For Dagum Gini Coefficient, a calculation indicator (e.g., per capita GDP) is required. The Group and Time variables may or may not be included, though both are often present. The Group is typically a region, and Time is usually a year. When both are included, the calculation will output results for each time period across the groups.

Theil Index

The Theil Index usually involves Group 1 (the highest level, e.g., a region), Group 2, and the smallest unit level (e.g., a city), together with the corresponding GDP/population data. Each row represents data for one smallest unit at one time point (usually one year).

Note

If there is only one Group, such as Region->State, Region->City or State->City, the granularity is based on the smallest unit (e.g., State is the smallest unit with 31 states over 5 years, resulting in 155 rows). If there are two levels, such as Region->State->City, the hierarchy is relative, and the smallest granularity unit is the finest-level data, determining the number of rows.

Moran's I Index

For Moran's I Index, the key is to provide the "spatial weight matrix" (highlighted in red), which should be in an n*n format, typically using 0 and 1, where 0 means non-adjacent and 1 means adjacent. The main diagonal of the matrix must be 0, as it represents the relationship of a location to itself.

Multidimensional Scaling (MDS)

For MDS, SPSSAU supports two formats: n*n format and raw data format. In the n*n format, the diagonal numbers must be 1, representing self-similarity, while the other numbers represent the distance between items (larger numbers indicate greater distances or dissimilarity).

In raw data format, there is a "Name" column followed by one column for each indicator. For example, the cultural similarity between Chinese cities could be studied using five indicators: longitude, latitude, average yearly temperature, average yearly rainfall, and average days of sunshine per year (larger numbers represent greater differences).

Conjoint Analysis (CA)

For Conjoint Analysis, an orthogonal experiment is typically conducted first to create the "Profiles" (combinations of attribute levels for virtual products). A survey is then conducted, and the "score" for each profile is gathered.

When entered into SPSSAU, the system focuses on "score" (Y) and the attribute. The smallest unit in research is the sample ID. For example, with 100 participants, each providing a score for one "profile", the dataset will have 100 rows. If each participant rates two profiles, the dataset will have 200 rows.

META Analysis

For META Analysis, each method has a different format, but the common feature is the inclusion of a Study column. Subgroup variables can be provided directly as text. If covariates exist, they should be placed accordingly in the corresponding Study. If no subgroup or covariate exists, those fields can be left empty.

Apriori Association Analysis

For Apriori Association Analysis, the data format is special. For example, with 1000 shopping orders, each order may contain multiple items. The data is arranged in two columns: the first for Order ID and the second for Products. Each row contains one product from one order. If an order has five products, five rows are created for that order.
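As a minimal sketch of this two-column layout, the Python snippet below flattens a few hypothetical orders into one row per order and product. The order IDs and product names are made up.

```python
# Hypothetical orders, each with its list of products
orders = {
    "1001": ["milk", "bread", "eggs"],
    "1002": ["bread"],
    "1003": ["milk", "eggs"],
}

# One row per (Order ID, Product) pair
rows = [
    (order_id, product)
    for order_id, products in orders.items()
    for product in products
]

for order_id, product in rows:
    print(order_id, product)
```

An order with three products therefore contributes three rows that share the same Order ID.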

Spatial Econometrics

For Spatial Econometrics (including spatial OLS, SLM, SEM, SAC, SDM, SDEM, SLX, and Spatial Panel Model), two datasets are required: spatial weight data and analysis data. Both datasets must be uploaded to SPSSAU. When analyzing, you need to select "Spatial Weight Data" from the drop-down.

"Spatial weight data" is uploaded as a separate data file, where the first row contains the names of entities (e.g., city names). Starting from the second row, the data is an n*n matrix, with each cell representing the spatial weight between two entities. It's essential that the spatial weight matrix is in an n*n format, with the main diagonal of the matrix containing zeros, and the matrix must be symmetric.

The order of the "analysis data" must align perfectly with the order of the "spatial weight data."

Spatial Panel Model

For a spatial panel model, such as for 31 cities over 5 years (from 2020 to 2024), the analysis data will have 31*5=155 rows. However, the spatial weight data will still have 31 rows. The data structure should be ordered as follows:

  • ID = 1, then years 2020, 2021, 2022, 2023, 2024
  • ID = 2, then years 2020, 2021, 2022, 2023, 2024
  • ID = 3, and so on...

In other words, the analysis data must be sorted by ID and time in ascending order. The sequence of IDs in the analysis data must match the sequence in the spatial weight data.
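The ordering requirement can be checked with a short script before uploading. The Python sketch below sorts made-up analysis rows by ID and year and then confirms that the IDs appear in the same sequence as in the spatial weight data. The field names and the helper functions are illustrative assumptions.

```python
def sort_panel(rows):
    """Sort analysis rows by ID first, then by year, both ascending."""
    return sorted(rows, key=lambda r: (r["ID"], r["Year"]))

def ids_match(rows, weight_ids):
    """Check that IDs appear in the analysis data in the weight-data order."""
    seen = list(dict.fromkeys(r["ID"] for r in rows))  # unique IDs, order kept
    return seen == list(weight_ids)

weight_ids = [1, 2, 3]  # order of entities in the spatial weight data
analysis = [
    {"ID": 2, "Year": 2021}, {"ID": 1, "Year": 2020},
    {"ID": 3, "Year": 2020}, {"ID": 1, "Year": 2021},
    {"ID": 2, "Year": 2020}, {"ID": 3, "Year": 2021},
]

sorted_rows = sort_panel(analysis)
print(ids_match(analysis, weight_ids), ids_match(sorted_rows, weight_ids))  # False True
```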

Note:

If the ID is a text value (e.g., the name of a city), SPSSAU will automatically convert it into a numeric code during upload (e.g., New York becomes 3). However, for spatial weight matrices, the system expects the ID to match exactly with the order of spatial weight data. If New York is in the first row of the spatial weight matrix, but its ID has been converted to 3 in the analysis data, this will cause misalignment. Therefore, it is recommended to upload the ID column as "numerical" data (not text).